Project 1-Final Report

-Darshit Doshi and Jared Kohler

We downloaded the Data from SafeGraph https://shop.safegraph.com/ with a coupon code aimed at data science students.

About Safegraph: "We obtain a variety of information (collectively the “Information”) from trusted third-party data partners such as mobile application developers. We collect this Information primarily through APIs, which are interfaces through which these app developers can provide us with information about their users. We sometimes collect the Information through other delivery methods, such as software development kits (“SDKs”) that are embedded directly into mobile apps."

The dataset we downloaded using is composed of several related tables with information about Cafes and Coffee shops in San Francisco, CA.
The first of these tables is the Core table, containing site information about each of the businesses contained in the dataset.

In [1]:
import pandas as pd
import fiona, os, shapely,json, copy
from shapely.geometry import shape, mapping
import geopandas as gpd
from geopandas import GeoSeries, GeoDataFrame
from IPython.display import Image
In [2]:
core = pd.read_csv('Data/CA-722515-CORE_POI-2019-09-27.csv')
In [3]:
core.T
Out[3]:
0 1 2 3 4 5 6 7 8 9 ... 649 650 651 652 653 654 655 656 657 658
safegraph_place_id sg:10c639a044e74f8bb976ace7d13429ed sg:eb7b5998f21447ba89dd8ab35a582dfe sg:118c95bf3c4e4370920d7a026fd3ebbe sg:d9f425fc18a24414b05a053442d1e408 sg:de408f31b2cd4eb783cc5de4ad6ed651 sg:5fe148c19cfc41efa64cd3c592b9e7a1 sg:fcc6a71553c04140863220456ff2f4e6 sg:a8248377476f4f40ac9eeaef65279570 sg:58be0fc40bc54869aad901c0a58458f7 sg:521f82bd4a1c40d4b672f0fb9330d052 ... sg:164441533dff4d1491c1b3314e8e0095 sg:cafcf3d5150c4316bcaa47fc8cfe1516 sg:207dcbc17d744443b4630ada782dda8c sg:44dc11faf62b44958a1522501f29e978 sg:f2baba072da4419a874f6c63d42897b4 sg:889ebdb0bc654fd19ea41ff5966b3c8c sg:13e339c34a3647da86d10b5205ca4c1f sg:3157d7aaa1db48448d40e6572884c1a3 sg:97c1bb1cbf5e4ff28860222a7490670d sg:efce8278589840149aa7f340f80f7640
parent_safegraph_place_id NaN NaN NaN NaN NaN sg:15e54773cb984527b7549b969bfed6fe NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
location_name La Boulangerie De San Francisco Hayes Valley The Cafe Royal Ground Coffee Chestnut Street Coffee Roastery Wholesome Bakery Humphry Slocombe Ice Cream Project Juice Happy Donuts Sightglass Coffee Muddy Waters Coffee House ... The Posh Bagel Karma Cafe Angel Cafe & Deli Cafe Murano Womply Cafe Claddagh Coffee Starbucks 3 19 Coffee Xpresso Tea Java Source Market
safegraph_brand_ids NaN NaN NaN NaN NaN NaN SG_BRAND_e534bc70bdac768a69613385bd2f4f31 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN SG_BRAND_f116acfe9147494063e58da666d1d57e NaN NaN NaN
brands NaN NaN NaN NaN NaN NaN Project Juice NaN NaN NaN ... NaN NaN NaN NaN NaN NaN Starbucks NaN NaN NaN
top_category Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places ... Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places Restaurants and Other Eating Places
sub_category Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars ... Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars Snack and Nonalcoholic Beverage Bars
naics_code 722515 722515 722515 722515 722515 722515 722515 722515 722515 722515 ... 722515 722515 722515 722515 722515 722515 722515 722515 722515 722515
street_address 500 hayes street 2369 market street 5301 geary boulevard 2331 chestnut street 299 divisadero street 1 ferry building ferry plaza farmers market 2234 chestnut street 2101 van ness avenue 3014 twentieth street 521 valencia street ... 4 embarcadero center 1901 hayes street 700 geary street 1777 steiner street 180 townsend street 951 geneva avenue 3735 buchanan 3359 26th street 6901 geary boulevard 500 second avenue
city san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco ... san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco
state ca ca ca ca ca ca ca ca ca ca ... ca ca ca ca ca ca ca ca ca ca
zip_code 94102 94114 94121 94123 94117 94111 94123 94109 94110 94110 ... 94111 94117 94109 94115 94107 94112 94123 94110 94121 94118
phone_number 1.4154e+10 1.41578e+10 NaN 1.41593e+10 1.41534e+10 NaN 1.41537e+10 1.41557e+10 1.41564e+10 1.41586e+10 ... 1.41599e+10 1.4157e+10 1.41593e+10 NaN NaN 1.41547e+10 1.41535e+10 1.41596e+10 1.41554e+10 NaN
open_hours NaN { "Mon": [["0:00", "2:00"]], "Tue": [["18:00",... { "Mon": [["6:30", "22:00"]], "Tue": [["6:30",... NaN { "Mon": [["7:00", "19:00"]], "Tue": [["7:00",... { "Mon": [["11:00", "21:30"]], "Tue": [["11:00... { "Mon": [], "Tue": [], "Wed": [], "Thu": [], ... { "Mon": [["5:30", "20:00"]], "Tue": [["5:30",... { "Mon": [["7:00", "19:00"]], "Tue": [["7:00",... { "Mon": [["6:00", "23:00"]], "Tue": [["6:00",... ... { "Mon": [["6:00", "18:00"]], "Tue": [["6:00",... NaN NaN { "Mon": [["6:30", "17:00"]], "Tue": [["6:30",... { "Mon": [], "Tue": [["4:00", "4:30"]], "Wed":... { "Mon": [["6:00", "19:00"]], "Tue": [["6:00",... { "Mon": [["5:00", "17:30"]], "Tue": [["5:00",... { "Mon": [["8:00", "14:00"]], "Tue": [["8:00",... { "Mon": [["11:00", "22:00"]], "Tue": [["11:00... NaN

14 rows × 659 columns

The main information from this table that is useful to our analysis is the open_hours column.

In [4]:
core.open_hours.loc[2]
Out[4]:
'{ "Mon": [["6:30", "22:00"]], "Tue": [["6:30", "22:00"]], "Wed": [["6:30", "22:00"]], "Thu": [["6:30", "22:00"]], "Fri": [["6:30", "22:00"]], "Sat": [["6:30", "22:00"]], "Sun": [["6:30", "22:00"]] }'

The second table is called the Patterns table, and it contains most of the useful information. Using the same primary key for each business as in the last table, this table contains many additional dimensions about traffic to the business.

The table loaded below includes foot traffic data for a one month period between August and September of this year (2019).

In [5]:
patterns = pd.read_csv('Data/CA-722515-PATTERNS-2019_08-2019-09-27.csv')
In [6]:
patterns.head()
Out[6]:
safegraph_place_id location_name street_address city state zip_code brands date_range_start date_range_end raw_visit_counts ... visitor_work_cbgs visitor_country_of_origin distance_from_home median_dwell bucketed_dwell_times related_same_day_brand related_same_month_brand popularity_by_hour popularity_by_day device_type
0 sg:eb7b5998f21447ba89dd8ab35a582dfe The Cafe 2369 market street san francisco ca 94114 NaN 1564617600 1567296000 163 ... {} {"US":138} 5986 39.0 {"<5":4,"5-20":59,"21-60":35,"61-240":56,">240... {"Subway":18,"In-N-Out Burger":17,"Shell Oil":... {"Starbucks":40,"Subway":32,"Chevron":24,"Targ... [62,51,13,3,4,4,4,4,5,6,5,9,10,14,21,20,19,14,... {"Monday":7,"Tuesday":7,"Wednesday":9,"Thursda... {"android":45,"ios":101}
1 sg:118c95bf3c4e4370920d7a026fd3ebbe Royal Ground Coffee 5301 geary boulevard san francisco ca 94121 NaN 1564617600 1567296000 50 ... {} {"US":22} 2082 683.5 {"<5":0,"5-20":8,"21-60":8,"61-240":2,">240":32} {} {"Target":42,"Starbucks":37,"Walgreens":25,"Ch... [33,33,32,32,33,32,32,30,18,7,6,4,3,1,0,0,0,7,... {"Monday":8,"Tuesday":9,"Wednesday":8,"Thursda... {"android":8,"ios":15}
2 sg:d9f425fc18a24414b05a053442d1e408 Chestnut Street Coffee Roastery 2331 chestnut street san francisco ca 94123 NaN 1564617600 1567296000 129 ... {} {"US":109} 3687 22.0 {"<5":6,"5-20":52,"21-60":38,"61-240":27,">240... {"76":33,"Staples":16,"Denny's":11} {"Starbucks":52,"Chevron":33,"Peet's Coffee an... [4,4,4,4,4,4,7,8,11,11,18,16,14,16,19,23,22,11... {"Monday":5,"Tuesday":18,"Wednesday":20,"Thurs... {"android":40,"ios":70}
3 sg:de408f31b2cd4eb783cc5de4ad6ed651 Wholesome Bakery 299 divisadero street san francisco ca 94117 NaN 1564617600 1567296000 180 ... {} {"US":94} 4156 53.0 {"<5":2,"5-20":61,"21-60":31,"61-240":40,">240... {"Safeway":25,"Project Juice":20,"Lucky Superm... {"Starbucks":34,"Peet's Coffee and Tea":23,"Wa... [36,35,35,37,37,37,42,47,46,38,49,53,56,52,38,... {"Monday":19,"Tuesday":14,"Wednesday":17,"Thur... {"android":45,"ios":62}
4 sg:fcc6a71553c04140863220456ff2f4e6 Project Juice 2234 chestnut street san francisco ca 94123 Project Juice 1564617600 1567296000 361 ... {"060750601001":6,"060750117001":5,"0607506150... {"US":213} 7893 21.0 {"<5":12,"5-20":167,"21-60":92,"61-240":62,">2... {"Tacolicious":14,"Peet's Coffee and Tea":14,"... {"Starbucks":61,"Walgreens":32,"Target":28,"Ch... [19,17,19,19,19,18,20,23,33,46,49,60,67,53,53,... {"Monday":41,"Tuesday":47,"Wednesday":46,"Thur... {"android":53,"ios":171}

5 rows × 23 columns

In [7]:
patterns.T
Out[7]:
0 1 2 3 4 5 6 7 8 9 ... 525 526 527 528 529 530 531 532 533 534
safegraph_place_id sg:eb7b5998f21447ba89dd8ab35a582dfe sg:118c95bf3c4e4370920d7a026fd3ebbe sg:d9f425fc18a24414b05a053442d1e408 sg:de408f31b2cd4eb783cc5de4ad6ed651 sg:fcc6a71553c04140863220456ff2f4e6 sg:a8248377476f4f40ac9eeaef65279570 sg:58be0fc40bc54869aad901c0a58458f7 sg:521f82bd4a1c40d4b672f0fb9330d052 sg:3f5759808277493ca55953724d4d6a21 sg:4c2cb694bb594c839459344674fcad1d ... sg:2b2e8d4dce0340cdbf0214f83efc0744 sg:164441533dff4d1491c1b3314e8e0095 sg:cafcf3d5150c4316bcaa47fc8cfe1516 sg:44dc11faf62b44958a1522501f29e978 sg:f2baba072da4419a874f6c63d42897b4 sg:889ebdb0bc654fd19ea41ff5966b3c8c sg:13e339c34a3647da86d10b5205ca4c1f sg:3157d7aaa1db48448d40e6572884c1a3 sg:97c1bb1cbf5e4ff28860222a7490670d sg:efce8278589840149aa7f340f80f7640
location_name The Cafe Royal Ground Coffee Chestnut Street Coffee Roastery Wholesome Bakery Project Juice Happy Donuts Sightglass Coffee Muddy Waters Coffee House Reveille Coffee Co Truck Chocolate Chair ... Latte Express The Posh Bagel Karma Cafe Cafe Murano Womply Cafe Claddagh Coffee Starbucks 3 19 Coffee Xpresso Tea Java Source Market
street_address 2369 market street 5301 geary boulevard 2331 chestnut street 299 divisadero street 2234 chestnut street 2101 van ness avenue 3014 twentieth street 521 valencia street 768 sansome street 1520 webster street ... 448 beach street 4 embarcadero center 1901 hayes street 1777 steiner street 180 townsend street 951 geneva avenue 3735 buchanan 3359 26th street 6901 geary boulevard 500 second avenue
city san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco ... san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco san francisco
state ca ca ca ca ca ca ca ca ca ca ... ca ca ca ca ca ca ca ca ca ca
zip_code 94114 94121 94123 94117 94123 94109 94110 94110 94111 94115 ... 94133 94111 94117 94115 94107 94112 94123 94110 94121 94118
brands NaN NaN NaN NaN Project Juice NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN Starbucks NaN NaN NaN
date_range_start 1564617600 1564617600 1564617600 1564617600 1564617600 1564617600 1564617600 1564617600 1564617600 1564617600 ... 1564617600 1564617600 1564617600 1564617600 1564617600 1564617600 1564617600 1564617600 1564617600 1564617600
date_range_end 1567296000 1567296000 1567296000 1567296000 1567296000 1567296000 1567296000 1567296000 1567296000 1567296000 ... 1567296000 1567296000 1567296000 1567296000 1567296000 1567296000 1567296000 1567296000 1567296000 1567296000
raw_visit_counts 163 50 129 180 361 216 172 148 11 2201 ... 301 2369 105 224 108 305 183 35 155 277
raw_visitor_counts 146 23 110 108 224 134 154 129 8 1737 ... 191 1544 62 139 81 234 144 24 120 117
visits_by_day [4,6,12,9,1,1,1,5,11,8,7,0,1,4,7,10,13,10,4,3,... [1,2,1,3,1,0,2,1,3,0,1,0,5,2,1,0,0,1,2,1,0,1,1... [2,0,3,5,3,5,5,7,6,7,4,1,3,4,4,3,9,1,1,3,6,7,2... [7,5,5,13,4,3,3,6,5,9,10,3,4,2,4,7,6,11,7,6,5,... [8,11,11,8,7,14,10,11,8,18,18,11,10,8,5,12,10,... [7,8,4,4,4,8,12,10,6,5,6,4,3,4,4,7,2,6,8,7,10,... [5,3,9,8,3,7,1,8,7,3,9,2,7,4,2,3,14,3,4,7,4,4,... [2,3,8,7,5,3,4,4,5,6,3,2,2,2,7,4,10,4,10,6,4,2... [0,0,1,2,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0... [52,60,136,124,58,59,64,54,79,110,107,66,65,38... ... [9,6,15,6,15,11,4,7,14,13,18,12,15,6,8,8,10,11... [74,106,62,54,52,77,98,95,78,71,47,82,86,101,9... [1,1,1,2,2,2,1,1,3,3,3,4,0,6,8,3,6,3,8,5,5,4,4... [9,4,4,7,6,6,2,5,10,10,7,1,8,5,5,7,5,12,10,5,6... [3,3,1,1,7,4,5,5,3,4,5,6,4,7,3,3,1,1,8,2,4,1,5... [8,14,12,11,9,11,7,7,7,9,7,7,8,10,9,11,12,8,7,... [2,7,7,8,3,4,8,3,4,8,11,5,7,5,1,4,5,7,6,7,8,2,... [1,2,1,2,2,0,1,0,0,2,1,2,0,1,1,1,2,2,0,2,0,2,1... [3,1,4,3,4,2,4,6,7,5,7,3,6,2,6,6,6,5,2,7,10,6,... [7,12,2,19,8,5,8,12,8,7,17,10,6,8,7,3,3,11,7,9...
visitor_home_cbgs {} {} {"060750127002":9,"060750601001":5} {"060750167004":10,"060750167003":5} {"060750126011":9,"060750126021":8,"0607501250... {"060750131011":7} {"060750228012":10,"060750228011":5} {} {} {"060750161003":11,"060750159001":10,"06075060... ... {} {"060750105002":18,"060750615003":13,"06075060... {"060750165003":8} {"060750155002":12} {} {"060750261002":9,"060750260041":9,"0607502600... {"060750126021":7,"060750126022":6,"0607506010... {"060750209002":6} {} {"060750451001":11}
visitor_work_cbgs {} {} {} {} {"060750601001":6,"060750117001":5,"0607506150... {} {"060750615001":5,"060750228011":5} {} {} {"060750615001":21,"060816023001":11,"06075012... ... {} {"060750117001":85,"060750615001":76,"06075010... {} {} {"060750615005":6} {} {} {} {"060750478013":7} {}
visitor_country_of_origin {"US":138} {"US":22} {"US":109} {"US":94} {"US":213} {"US":130} {"US":146} {"US":123} {"US":7} {"US":1686} ... {"US":180} {"US":1468,"CA":5} {"US":52} {"US":122} {"US":72} {"US":224} {"US":141} {"US":18} {"US":98} {"US":90}
distance_from_home 5986 2082 3687 4156 7893 12043 6929 7368 10081 24819 ... 723499 31389 5677 7713 24248 2064 10117 4360 4629 7162
median_dwell 39 683.5 22 53 21 22 39 39.5 87 22 ... 41 56 47 30.5 20.5 12 10 768 10 38
bucketed_dwell_times {"<5":4,"5-20":59,"21-60":35,"61-240":56,">240... {"<5":0,"5-20":8,"21-60":8,"61-240":2,">240":32} {"<5":6,"5-20":52,"21-60":38,"61-240":27,">240... {"<5":2,"5-20":61,"21-60":31,"61-240":40,">240... {"<5":12,"5-20":167,"21-60":92,"61-240":62,">2... {"<5":5,"5-20":95,"21-60":48,"61-240":30,">240... {"<5":1,"5-20":51,"21-60":56,"61-240":50,">240... {"<5":3,"5-20":48,"21-60":55,"61-240":38,">240... {"<5":0,"5-20":4,"21-60":1,"61-240":2,">240":4} {"<5":32,"5-20":934,"21-60":751,"61-240":353,"... ... {"<5":6,"5-20":101,"21-60":69,"61-240":33,">24... {"<5":27,"5-20":559,"21-60":670,"61-240":804,"... {"<5":3,"5-20":30,"21-60":22,"61-240":19,">240... {"<5":11,"5-20":82,"21-60":45,"61-240":34,">24... {"<5":8,"5-20":46,"21-60":13,"61-240":17,">240... {"<5":12,"5-20":195,"21-60":63,"61-240":24,">2... {"<5":9,"5-20":135,"21-60":0,"61-240":3,">240"... {"<5":0,"5-20":4,"21-60":2,"61-240":5,">240":24} {"<5":9,"5-20":89,"21-60":23,"61-240":13,">240... {"<5":14,"5-20":99,"21-60":57,"61-240":51,">24...
related_same_day_brand {"Subway":18,"In-N-Out Burger":17,"Shell Oil":... {} {"76":33,"Staples":16,"Denny's":11} {"Safeway":25,"Project Juice":20,"Lucky Superm... {"Tacolicious":14,"Peet's Coffee and Tea":14,"... {"Westin Hotels & Resorts":25,"7-Eleven":21,"R... {"Subway":11} {"Texaco":11} {} {"Benihana":16,"DAISO":15,"Safeway Pharmacy":8... ... {"Starbucks":21,"The Cheesecake Factory":17,"D... {"Philz Coffee":12,"Starbucks":11,"Buckhorn Gr... {"Whole Foods Market":25} {"Flyers Energy":20,"76":16,"DAISO":14,"Shell ... {"Crunch":33,"ARCO":33,"Costco Wholesale Corp.... {"Safeway":12,"CrossFit":9,"Jack in the Box":8... {"Hillstone Restaurant Group":25,"Lexus":14,"W... {} {"Shell Oil":13} {"Ram":8}
related_same_month_brand {"Starbucks":40,"Subway":32,"Chevron":24,"Targ... {"Target":42,"Starbucks":37,"Walgreens":25,"Ch... {"Starbucks":52,"Chevron":33,"Peet's Coffee an... {"Starbucks":34,"Peet's Coffee and Tea":23,"Wa... {"Starbucks":61,"Walgreens":32,"Target":28,"Ch... {"Chevron":43,"Starbucks":40,"Subway":34,"Shel... {"Starbucks":41,"Target":20,"Subway":19,"Safew... {"Starbucks":46,"Target":34,"Chevron":32,"Subw... {"Starbucks":43,"Peet's Coffee and Tea":37,"Ta... {"Target":43,"Starbucks":41,"Costco Wholesale ... ... {"Starbucks":44,"Chevron":32,"Shell Oil":31,"T... {"Starbucks":52,"Target":28,"Costco Wholesale ... {"Starbucks":40,"Target":24,"Chevron":21,"Peet... {"Starbucks":44,"Chevron":31,"Costco Wholesale... {"Starbucks":45,"Chevron":34,"Subway":27,"Walg... {"Target":36,"Costco Wholesale Corp.":34,"Chev... {"Walgreens":37,"Target":36,"Safeway":33,"Pure... {"Starbucks":14,"Whole Foods Market":12,"76":1... {"Target":35,"Starbucks":27,"Costco Wholesale ... {"Safeway":23,"Costco Wholesale Corp.":21,"Tar...
popularity_by_hour [62,51,13,3,4,4,4,4,5,6,5,9,10,14,21,20,19,14,... [33,33,32,32,33,32,32,30,18,7,6,4,3,1,0,0,0,7,... [4,4,4,4,4,4,7,8,11,11,18,16,14,16,19,23,22,11... [36,35,35,37,37,37,42,47,46,38,49,53,56,52,38,... [19,17,19,19,19,18,20,23,33,46,49,60,67,53,53,... [29,30,32,32,32,25,25,29,27,31,34,34,41,42,36,... [16,15,9,7,7,7,10,9,15,17,14,17,12,13,15,26,22... [3,6,2,2,0,0,0,0,6,5,11,19,23,27,25,18,18,13,2... [3,3,3,3,3,3,3,3,4,3,3,3,3,2,2,2,2,0,1,1,5,5,6,5] [33,40,40,28,23,26,25,42,65,87,173,309,399,451... ... [85,88,90,90,90,90,88,77,70,55,36,21,28,24,33,... [124,126,120,121,120,118,151,179,218,208,255,3... [23,23,24,26,25,24,25,26,27,28,24,28,32,36,35,... [35,35,38,38,42,47,43,44,48,38,51,53,47,43,35,... [8,7,8,7,8,9,8,11,21,17,19,29,38,34,32,32,25,7... [2,2,1,2,2,2,5,18,28,37,57,41,33,27,32,34,36,2... [17,20,19,19,19,20,23,22,30,33,36,43,47,40,38,... [21,21,21,23,22,22,22,22,22,24,23,22,20,17,13,... [21,19,16,17,14,17,20,22,23,12,10,11,14,14,24,... [51,46,44,46,43,42,47,52,59,49,49,43,36,28,31,...
popularity_by_day {"Monday":7,"Tuesday":7,"Wednesday":9,"Thursda... {"Monday":8,"Tuesday":9,"Wednesday":8,"Thursda... {"Monday":5,"Tuesday":18,"Wednesday":20,"Thurs... {"Monday":19,"Tuesday":14,"Wednesday":17,"Thur... {"Monday":41,"Tuesday":47,"Wednesday":46,"Thur... {"Monday":30,"Tuesday":26,"Wednesday":43,"Thur... {"Monday":18,"Tuesday":25,"Wednesday":16,"Thur... {"Monday":22,"Tuesday":13,"Wednesday":14,"Thur... {"Monday":1,"Tuesday":0,"Wednesday":3,"Thursda... {"Monday":219,"Tuesday":231,"Wednesday":238,"T... ... {"Monday":40,"Tuesday":44,"Wednesday":27,"Thur... {"Monday":281,"Tuesday":322,"Wednesday":351,"T... {"Monday":16,"Tuesday":11,"Wednesday":16,"Thur... {"Monday":26,"Tuesday":23,"Wednesday":22,"Thur... {"Monday":25,"Tuesday":13,"Wednesday":22,"Thur... {"Monday":30,"Tuesday":44,"Wednesday":36,"Thur... {"Monday":18,"Tuesday":22,"Wednesday":30,"Thur... {"Monday":6,"Tuesday":2,"Wednesday":5,"Thursda... {"Monday":20,"Tuesday":24,"Wednesday":20,"Thur... {"Monday":29,"Tuesday":27,"Wednesday":36,"Thur...
device_type {"android":45,"ios":101} {"android":8,"ios":15} {"android":40,"ios":70} {"android":45,"ios":62} {"android":53,"ios":171} {"android":56,"ios":77} {"android":38,"ios":116} {"android":34,"ios":95} {"android":0,"ios":7} {"android":771,"ios":965} ... {"android":70,"ios":121} {"android":532,"ios":1010} {"android":18,"ios":44} {"android":47,"ios":92} {"android":27,"ios":54} {"android":102,"ios":133} {"android":74,"ios":70} {"android":13,"ios":11} {"android":40,"ios":80} {"android":46,"ios":72}

23 rows × 535 columns

Initially cleaning removes data columns that are either not relevant to current analysis are are too sparsely recorded to be of use.

In [8]:
clean_patterns = patterns.drop(columns=['city', 'state', 'date_range_start', 'date_range_end', 'visitor_home_cbgs', 'visitor_work_cbgs', 'visitor_country_of_origin', 'related_same_day_brand', 'related_same_month_brand', 'device_type'])
In [9]:
clean_patterns_core = pd.merge(clean_patterns, core[['safegraph_place_id', 'open_hours']], left_on='safegraph_place_id', right_on='safegraph_place_id', how = 'left')

Dropping incorrectly included night club 'The cafe' which appears to have been incorrectly included in the data set of cafes.

In [10]:
clean_patterns_core = clean_patterns_core.drop(0)

Storing this data to a csv file for tableau analysis

In [11]:
clean_patterns_core.to_csv('clean_patterns_core.csv')

Main dataframe

In [12]:
clean_patterns_core.T
Out[12]:
1 2 3 4 5 6 7 8 9 10 ... 525 526 527 528 529 530 531 532 533 534
safegraph_place_id sg:118c95bf3c4e4370920d7a026fd3ebbe sg:d9f425fc18a24414b05a053442d1e408 sg:de408f31b2cd4eb783cc5de4ad6ed651 sg:fcc6a71553c04140863220456ff2f4e6 sg:a8248377476f4f40ac9eeaef65279570 sg:58be0fc40bc54869aad901c0a58458f7 sg:521f82bd4a1c40d4b672f0fb9330d052 sg:3f5759808277493ca55953724d4d6a21 sg:4c2cb694bb594c839459344674fcad1d sg:094b444cc4e247d691893d17dcafbd9e ... sg:2b2e8d4dce0340cdbf0214f83efc0744 sg:164441533dff4d1491c1b3314e8e0095 sg:cafcf3d5150c4316bcaa47fc8cfe1516 sg:44dc11faf62b44958a1522501f29e978 sg:f2baba072da4419a874f6c63d42897b4 sg:889ebdb0bc654fd19ea41ff5966b3c8c sg:13e339c34a3647da86d10b5205ca4c1f sg:3157d7aaa1db48448d40e6572884c1a3 sg:97c1bb1cbf5e4ff28860222a7490670d sg:efce8278589840149aa7f340f80f7640
location_name Royal Ground Coffee Chestnut Street Coffee Roastery Wholesome Bakery Project Juice Happy Donuts Sightglass Coffee Muddy Waters Coffee House Reveille Coffee Co Truck Chocolate Chair Cantata Coffee Company ... Latte Express The Posh Bagel Karma Cafe Cafe Murano Womply Cafe Claddagh Coffee Starbucks 3 19 Coffee Xpresso Tea Java Source Market
street_address 5301 geary boulevard 2331 chestnut street 299 divisadero street 2234 chestnut street 2101 van ness avenue 3014 twentieth street 521 valencia street 768 sansome street 1520 webster street 1708 haight street ... 448 beach street 4 embarcadero center 1901 hayes street 1777 steiner street 180 townsend street 951 geneva avenue 3735 buchanan 3359 26th street 6901 geary boulevard 500 second avenue
zip_code 94121 94123 94117 94123 94109 94110 94110 94111 94115 94117 ... 94133 94111 94117 94115 94107 94112 94123 94110 94121 94118
brands NaN NaN NaN Project Juice NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN Starbucks NaN NaN NaN
raw_visit_counts 50 129 180 361 216 172 148 11 2201 322 ... 301 2369 105 224 108 305 183 35 155 277
raw_visitor_counts 23 110 108 224 134 154 129 8 1737 193 ... 191 1544 62 139 81 234 144 24 120 117
visits_by_day [1,2,1,3,1,0,2,1,3,0,1,0,5,2,1,0,0,1,2,1,0,1,1... [2,0,3,5,3,5,5,7,6,7,4,1,3,4,4,3,9,1,1,3,6,7,2... [7,5,5,13,4,3,3,6,5,9,10,3,4,2,4,7,6,11,7,6,5,... [8,11,11,8,7,14,10,11,8,18,18,11,10,8,5,12,10,... [7,8,4,4,4,8,12,10,6,5,6,4,3,4,4,7,2,6,8,7,10,... [5,3,9,8,3,7,1,8,7,3,9,2,7,4,2,3,14,3,4,7,4,4,... [2,3,8,7,5,3,4,4,5,6,3,2,2,2,7,4,10,4,10,6,4,2... [0,0,1,2,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0... [52,60,136,124,58,59,64,54,79,110,107,66,65,38... [9,8,7,10,14,11,7,6,9,16,9,12,12,9,11,14,8,13,... ... [9,6,15,6,15,11,4,7,14,13,18,12,15,6,8,8,10,11... [74,106,62,54,52,77,98,95,78,71,47,82,86,101,9... [1,1,1,2,2,2,1,1,3,3,3,4,0,6,8,3,6,3,8,5,5,4,4... [9,4,4,7,6,6,2,5,10,10,7,1,8,5,5,7,5,12,10,5,6... [3,3,1,1,7,4,5,5,3,4,5,6,4,7,3,3,1,1,8,2,4,1,5... [8,14,12,11,9,11,7,7,7,9,7,7,8,10,9,11,12,8,7,... [2,7,7,8,3,4,8,3,4,8,11,5,7,5,1,4,5,7,6,7,8,2,... [1,2,1,2,2,0,1,0,0,2,1,2,0,1,1,1,2,2,0,2,0,2,1... [3,1,4,3,4,2,4,6,7,5,7,3,6,2,6,6,6,5,2,7,10,6,... [7,12,2,19,8,5,8,12,8,7,17,10,6,8,7,3,3,11,7,9...
distance_from_home 2082 3687 4156 7893 12043 6929 7368 10081 24819 5359 ... 723499 31389 5677 7713 24248 2064 10117 4360 4629 7162
median_dwell 683.5 22 53 21 22 39 39.5 87 22 27.5 ... 41 56 47 30.5 20.5 12 10 768 10 38
bucketed_dwell_times {"<5":0,"5-20":8,"21-60":8,"61-240":2,">240":32} {"<5":6,"5-20":52,"21-60":38,"61-240":27,">240... {"<5":2,"5-20":61,"21-60":31,"61-240":40,">240... {"<5":12,"5-20":167,"21-60":92,"61-240":62,">2... {"<5":5,"5-20":95,"21-60":48,"61-240":30,">240... {"<5":1,"5-20":51,"21-60":56,"61-240":50,">240... {"<5":3,"5-20":48,"21-60":55,"61-240":38,">240... {"<5":0,"5-20":4,"21-60":1,"61-240":2,">240":4} {"<5":32,"5-20":934,"21-60":751,"61-240":353,"... {"<5":8,"5-20":127,"21-60":73,"61-240":55,">24... ... {"<5":6,"5-20":101,"21-60":69,"61-240":33,">24... {"<5":27,"5-20":559,"21-60":670,"61-240":804,"... {"<5":3,"5-20":30,"21-60":22,"61-240":19,">240... {"<5":11,"5-20":82,"21-60":45,"61-240":34,">24... {"<5":8,"5-20":46,"21-60":13,"61-240":17,">240... {"<5":12,"5-20":195,"21-60":63,"61-240":24,">2... {"<5":9,"5-20":135,"21-60":0,"61-240":3,">240"... {"<5":0,"5-20":4,"21-60":2,"61-240":5,">240":24} {"<5":9,"5-20":89,"21-60":23,"61-240":13,">240... {"<5":14,"5-20":99,"21-60":57,"61-240":51,">24...
popularity_by_hour [33,33,32,32,33,32,32,30,18,7,6,4,3,1,0,0,0,7,... [4,4,4,4,4,4,7,8,11,11,18,16,14,16,19,23,22,11... [36,35,35,37,37,37,42,47,46,38,49,53,56,52,38,... [19,17,19,19,19,18,20,23,33,46,49,60,67,53,53,... [29,30,32,32,32,25,25,29,27,31,34,34,41,42,36,... [16,15,9,7,7,7,10,9,15,17,14,17,12,13,15,26,22... [3,6,2,2,0,0,0,0,6,5,11,19,23,27,25,18,18,13,2... [3,3,3,3,3,3,3,3,4,3,3,3,3,2,2,2,2,0,1,1,5,5,6,5] [33,40,40,28,23,26,25,42,65,87,173,309,399,451... [45,48,49,47,49,47,50,62,40,54,63,66,82,62,74,... ... [85,88,90,90,90,90,88,77,70,55,36,21,28,24,33,... [124,126,120,121,120,118,151,179,218,208,255,3... [23,23,24,26,25,24,25,26,27,28,24,28,32,36,35,... [35,35,38,38,42,47,43,44,48,38,51,53,47,43,35,... [8,7,8,7,8,9,8,11,21,17,19,29,38,34,32,32,25,7... [2,2,1,2,2,2,5,18,28,37,57,41,33,27,32,34,36,2... [17,20,19,19,19,20,23,22,30,33,36,43,47,40,38,... [21,21,21,23,22,22,22,22,22,24,23,22,20,17,13,... [21,19,16,17,14,17,20,22,23,12,10,11,14,14,24,... [51,46,44,46,43,42,47,52,59,49,49,43,36,28,31,...
popularity_by_day {"Monday":8,"Tuesday":9,"Wednesday":8,"Thursda... {"Monday":5,"Tuesday":18,"Wednesday":20,"Thurs... {"Monday":19,"Tuesday":14,"Wednesday":17,"Thur... {"Monday":41,"Tuesday":47,"Wednesday":46,"Thur... {"Monday":30,"Tuesday":26,"Wednesday":43,"Thur... {"Monday":18,"Tuesday":25,"Wednesday":16,"Thur... {"Monday":22,"Tuesday":13,"Wednesday":14,"Thur... {"Monday":1,"Tuesday":0,"Wednesday":3,"Thursda... {"Monday":219,"Tuesday":231,"Wednesday":238,"T... {"Monday":46,"Tuesday":47,"Wednesday":35,"Thur... ... {"Monday":40,"Tuesday":44,"Wednesday":27,"Thur... {"Monday":281,"Tuesday":322,"Wednesday":351,"T... {"Monday":16,"Tuesday":11,"Wednesday":16,"Thur... {"Monday":26,"Tuesday":23,"Wednesday":22,"Thur... {"Monday":25,"Tuesday":13,"Wednesday":22,"Thur... {"Monday":30,"Tuesday":44,"Wednesday":36,"Thur... {"Monday":18,"Tuesday":22,"Wednesday":30,"Thur... {"Monday":6,"Tuesday":2,"Wednesday":5,"Thursda... {"Monday":20,"Tuesday":24,"Wednesday":20,"Thur... {"Monday":29,"Tuesday":27,"Wednesday":36,"Thur...
open_hours { "Mon": [["6:30", "22:00"]], "Tue": [["6:30",... NaN { "Mon": [["7:00", "19:00"]], "Tue": [["7:00",... { "Mon": [], "Tue": [], "Wed": [], "Thu": [], ... { "Mon": [["5:30", "20:00"]], "Tue": [["5:30",... { "Mon": [["7:00", "19:00"]], "Tue": [["7:00",... { "Mon": [["6:00", "23:00"]], "Tue": [["6:00",... { "Mon": [["7:30", "15:00"]], "Tue": [["7:30",... { "Mon": [["10:00", "21:00"]], "Tue": [["10:00... { "Mon": [["7:30", "20:00"]], "Tue": [["7:30",... ... NaN { "Mon": [["6:00", "18:00"]], "Tue": [["6:00",... NaN { "Mon": [["6:30", "17:00"]], "Tue": [["6:30",... { "Mon": [], "Tue": [["4:00", "4:30"]], "Wed":... { "Mon": [["6:00", "19:00"]], "Tue": [["6:00",... { "Mon": [["5:00", "17:30"]], "Tue": [["5:00",... { "Mon": [["8:00", "14:00"]], "Tue": [["8:00",... { "Mon": [["11:00", "22:00"]], "Tue": [["11:00... NaN

14 rows × 534 columns

Some basic analysis of out data

There are total 534 cafes in San Francisco out of which 18.2% are chain cafes such as Starbucks, Jamba, etc and rest of them are non-chain cafes.

In [13]:
Image(filename='Basic_analysis.JPG', width=600)
Out[13]:

Further analyzing the chain cafes, we see that approx. 70% of them are Starbucks.

In [14]:
Image(filename='Brand_Distribution.png', width=600)
Out[14]:

Plotting the geographic extent of our data (All cafes) on the map using Tableau and the shapefiles included with the dataset and seen later in the analysis.

In [15]:
Image(filename='geographic_extent.png', width=1000)
Out[15]:


Plotting only the chain brands on the map to analyze how they're located throughtout the city. We found out that 90% of the chain stores are located in the small radius around the downtown district.

In [16]:
Image(filename='geographic_extent_only_chain.png', width=1000)
Out[16]:

Popularity by hour

The data set includes data for each business, representing the foot-traffic in that business during each hour of the day. This data is stored in list objects with 24 values - one for each hour of the day. Extracting this and cleaning it to only show information for the hours that the business shows to be open.

In [17]:
popularity_hour = clean_patterns_core[['safegraph_place_id', 'location_name', 'brands',  'popularity_by_hour', 'open_hours']]

Checking for % of null values

In [18]:
len(popularity_hour[popularity_hour.open_hours.isna()])/len(popularity_hour)
Out[18]:
0.2846441947565543

Only considering the not null values

In [19]:
popularity_hour = popularity_hour[pd.notna(popularity_hour['open_hours'])]

Creating functions for extracting the start and end time

In [20]:
def clean_starttime(x):
    if x:
        return int(x[0][0].split(':')[0])
    else:
        return float('NaN')
In [21]:
def clean_endtime(x):
    if x:
        return int(x[0][1].split(':')[0])
    else:
        return float('NaN')

For the sake of analysis, Wednesday will be used to represent a typical day. The open hours for Wednesday will be used for visualization purposes.

In [22]:
popularity_hour['start_time']=popularity_hour.open_hours.apply(lambda x : clean_starttime(json.loads(x)['Wed']))
popularity_hour['end_time']=popularity_hour.open_hours.apply(lambda x : clean_endtime(json.loads(x)['Wed']))
In [23]:
popularity_hour = popularity_hour[popularity_hour.start_time.notna()]
In [24]:
popularity_hour.head().T
Out[24]:
1 3 5 6 7
safegraph_place_id sg:118c95bf3c4e4370920d7a026fd3ebbe sg:de408f31b2cd4eb783cc5de4ad6ed651 sg:a8248377476f4f40ac9eeaef65279570 sg:58be0fc40bc54869aad901c0a58458f7 sg:521f82bd4a1c40d4b672f0fb9330d052
location_name Royal Ground Coffee Wholesome Bakery Happy Donuts Sightglass Coffee Muddy Waters Coffee House
brands NaN NaN NaN NaN NaN
popularity_by_hour [33,33,32,32,33,32,32,30,18,7,6,4,3,1,0,0,0,7,... [36,35,35,37,37,37,42,47,46,38,49,53,56,52,38,... [29,30,32,32,32,25,25,29,27,31,34,34,41,42,36,... [16,15,9,7,7,7,10,9,15,17,14,17,12,13,15,26,22... [3,6,2,2,0,0,0,0,6,5,11,19,23,27,25,18,18,13,2...
open_hours { "Mon": [["6:30", "22:00"]], "Tue": [["6:30",... { "Mon": [["7:00", "19:00"]], "Tue": [["7:00",... { "Mon": [["5:30", "20:00"]], "Tue": [["5:30",... { "Mon": [["7:00", "19:00"]], "Tue": [["7:00",... { "Mon": [["6:00", "23:00"]], "Tue": [["6:00",...
start_time 6 7 5 7 6
end_time 22 19 20 19 23

A plot showing the number of business opening at each hour of the day.

In [76]:
ph = popularity_hour.start_time.plot.hist(bins=24, width=1)
ph.set_xlabel('Hour of day')
ph.set_ylabel('Count of cafes')
ph.set_title('Common start time hours')
Out[76]:
Text(0.5, 1.0, 'Common start time hours')
Out[76]:

A similar plot done on tableau to find the number of cafes opening at each hour of the day. Most common time for cafes to open is 6:00-8:00 AM

In [26]:
popularity_hour.to_csv('start_end_time.csv')
In [27]:
Image(filename='start_time.JPG', width=500)
Out[27]:

A plot done on tableau to find the number of cafes closing at each hour of the day. Most common time for cafes to close is 6:00-8:00 pm.

In [28]:
Image(filename='end_time.JPG', width=500)
Out[28]:

Now we find the average popularity of each cafe throughout the day and then plot it using mapbox and javascript

In [29]:
popularity_hour_list = popularity_hour.values.tolist()
In [31]:
for x1 in popularity_hour_list:
    x1[3] = [int(x1[3].strip('][').split(",")[i]) if (i+1) > x1[5] and (i+1) < x1[6] else 0 for i in range(len(x1[3].split(",")))]
In [32]:
popularity_hour_dataframe = pd.DataFrame(popularity_hour_list)
In [34]:
popularity_hour_dataframe = popularity_hour_dataframe.drop(columns=[4,5,6])
In [35]:
popularity_hour_dataframe.head()
Out[35]:
0 1 2 3
0 sg:118c95bf3c4e4370920d7a026fd3ebbe Royal Ground Coffee NaN [0, 0, 0, 0, 0, 0, 32, 30, 18, 7, 6, 4, 3, 1, ...
1 sg:de408f31b2cd4eb783cc5de4ad6ed651 Wholesome Bakery NaN [0, 0, 0, 0, 0, 0, 0, 47, 46, 38, 49, 53, 56, ...
2 sg:a8248377476f4f40ac9eeaef65279570 Happy Donuts NaN [0, 0, 0, 0, 0, 25, 25, 29, 27, 31, 34, 34, 41...
3 sg:58be0fc40bc54869aad901c0a58458f7 Sightglass Coffee NaN [0, 0, 0, 0, 0, 0, 0, 9, 15, 17, 14, 17, 12, 1...
4 sg:521f82bd4a1c40d4b672f0fb9330d052 Muddy Waters Coffee House NaN [0, 0, 0, 0, 0, 0, 0, 0, 6, 5, 11, 19, 23, 27,...
In [36]:
popularity_hour_dataframe = popularity_hour_dataframe.rename(columns={0: "safegraph_", 1: "Name", 2:"Brands", 3: "Hours"})
In [37]:
for i in range(24):
    popularity_hour_dataframe[str(i+1)] = popularity_hour_dataframe['Hours'].apply(lambda x:x[i])
In [38]:
popularity_hour_dataframe = popularity_hour_dataframe.drop(columns=['Hours'])

A simplified version of the traffic data. Now there is a column for each hour of the day with a value for each business.

In [39]:
popularity_hour_dataframe.head()
Out[39]:
safegraph_ Name Brands 1 2 3 4 5 6 7 ... 15 16 17 18 19 20 21 22 23 24
0 sg:118c95bf3c4e4370920d7a026fd3ebbe Royal Ground Coffee NaN 0 0 0 0 0 0 32 ... 0 0 0 7 19 28 26 0 0 0
1 sg:de408f31b2cd4eb783cc5de4ad6ed651 Wholesome Bakery NaN 0 0 0 0 0 0 0 ... 38 31 21 21 0 0 0 0 0 0
2 sg:a8248377476f4f40ac9eeaef65279570 Happy Donuts NaN 0 0 0 0 0 25 25 ... 36 37 30 19 30 0 0 0 0 0
3 sg:58be0fc40bc54869aad901c0a58458f7 Sightglass Coffee NaN 0 0 0 0 0 0 0 ... 15 26 22 15 0 0 0 0 0 0
4 sg:521f82bd4a1c40d4b672f0fb9330d052 Muddy Waters Coffee House NaN 0 0 0 0 0 0 0 ... 25 18 18 13 24 30 24 11 0 0

5 rows × 27 columns

The third and final component of the dataset is a set of shape files, showing the outline of each of the business. Instead of the outlines, a centroid will be more useful, allowing for a bubble map of locations.

In [40]:
with fiona.open('Data/CA-722515-GEOMETRY-2019-09-27.shp') as src:
    meta = src.meta
    meta['schema']['geometry'] = 'Point'
    with fiona.open('Data/centroids.shp', 'w', **meta) as dst:
        for f in src:
            centroid = shape(f['geometry']).centroid
            f['geometry'] = mapping(centroid)
            dst.write(f)
In [41]:
centroids= gpd.read_file('Data/centroids.shp')

The centroid values are merged with the hourly traffic data, producing a product nearly ready to map.

In [43]:
centroid_hours = centroids.merge(popularity_hour_dataframe, on='safegraph_')

A geojson file will serve well for mapping purposes

In [45]:
centroid_hours.to_file("centroid_hours3.geojson", driver='GeoJSON')

This screen capture shows the location of the cafes with the small blue dots. The hour represented is midnight, so almost all cafes are closed showing no business.

In [46]:
Image(filename='Midnight.png', width=1000)
Out[46]:

In contrast, this screen capture represents mid-day. Now nearly all cafes are open and are colored and sized relative to their number of visitors.

In [47]:
Image(filename='Midday.png', width=1000)
Out[47]:

Analyzing the traffic near the cafes when they're closed

In [48]:
popularity_hour.head()
Out[48]:
safegraph_place_id location_name brands popularity_by_hour open_hours start_time end_time
1 sg:118c95bf3c4e4370920d7a026fd3ebbe Royal Ground Coffee NaN [33,33,32,32,33,32,32,30,18,7,6,4,3,1,0,0,0,7,... { "Mon": [["6:30", "22:00"]], "Tue": [["6:30",... 6.0 22.0
3 sg:de408f31b2cd4eb783cc5de4ad6ed651 Wholesome Bakery NaN [36,35,35,37,37,37,42,47,46,38,49,53,56,52,38,... { "Mon": [["7:00", "19:00"]], "Tue": [["7:00",... 7.0 19.0
5 sg:a8248377476f4f40ac9eeaef65279570 Happy Donuts NaN [29,30,32,32,32,25,25,29,27,31,34,34,41,42,36,... { "Mon": [["5:30", "20:00"]], "Tue": [["5:30",... 5.0 20.0
6 sg:58be0fc40bc54869aad901c0a58458f7 Sightglass Coffee NaN [16,15,9,7,7,7,10,9,15,17,14,17,12,13,15,26,22... { "Mon": [["7:00", "19:00"]], "Tue": [["7:00",... 7.0 19.0
7 sg:521f82bd4a1c40d4b672f0fb9330d052 Muddy Waters Coffee House NaN [3,6,2,2,0,0,0,0,6,5,11,19,23,27,25,18,18,13,2... { "Mon": [["6:00", "23:00"]], "Tue": [["6:00",... 6.0 23.0
In [49]:
popularity_hour_closed_list = popularity_hour.values.tolist()
In [51]:
for x1 in popularity_hour_closed_list:
    x1[3] = [int(x1[3].strip('][').split(",")[i]) if (i+1) < x1[5] or (i+1) > x1[6] else 0 for i in range(len(x1[3].split(",")))]
In [52]:
popularity_hour_closed_dataframe = pd.DataFrame(popularity_hour_closed_list)
In [53]:
popularity_hour_closed_dataframe = popularity_hour_closed_dataframe.drop(columns=[4,5,6])
In [54]:
popularity_hour_closed_dataframe.head()
Out[54]:
0 1 2 3
0 sg:118c95bf3c4e4370920d7a026fd3ebbe Royal Ground Coffee NaN [33, 33, 32, 32, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0...
1 sg:de408f31b2cd4eb783cc5de4ad6ed651 Wholesome Bakery NaN [36, 35, 35, 37, 37, 37, 0, 0, 0, 0, 0, 0, 0, ...
2 sg:a8248377476f4f40ac9eeaef65279570 Happy Donuts NaN [29, 30, 32, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
3 sg:58be0fc40bc54869aad901c0a58458f7 Sightglass Coffee NaN [16, 15, 9, 7, 7, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0...
4 sg:521f82bd4a1c40d4b672f0fb9330d052 Muddy Waters Coffee House NaN [3, 6, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
In [55]:
popularity_hour_closed_dataframe = popularity_hour_closed_dataframe.rename(columns={0: "safegraph_", 1: "Name", 2: "Brands", 3: "Hours"})
In [56]:
for i in range(24):
    popularity_hour_closed_dataframe[str(i+1)] = popularity_hour_closed_dataframe['Hours'].apply(lambda x:x[i])
In [57]:
popularity_hour_closed_dataframe = popularity_hour_closed_dataframe.drop(columns=['Hours'])
In [58]:
popularity_hour_closed_dataframe.head()
Out[58]:
safegraph_ Name Brands 1 2 3 4 5 6 7 ... 15 16 17 18 19 20 21 22 23 24
0 sg:118c95bf3c4e4370920d7a026fd3ebbe Royal Ground Coffee NaN 33 33 32 32 33 0 0 ... 0 0 0 0 0 0 0 0 37 35
1 sg:de408f31b2cd4eb783cc5de4ad6ed651 Wholesome Bakery NaN 36 35 35 37 37 37 0 ... 0 0 0 0 0 39 39 37 37 39
2 sg:a8248377476f4f40ac9eeaef65279570 Happy Donuts NaN 29 30 32 32 0 0 0 ... 0 0 0 0 0 0 29 27 32 31
3 sg:58be0fc40bc54869aad901c0a58458f7 Sightglass Coffee NaN 16 15 9 7 7 7 0 ... 0 0 0 0 0 43 39 33 32 19
4 sg:521f82bd4a1c40d4b672f0fb9330d052 Muddy Waters Coffee House NaN 3 6 2 2 0 0 0 ... 0 0 0 0 0 0 0 0 0 6

5 rows × 27 columns

In [59]:
centroid_closed_hours = centroids.merge(popularity_hour_closed_dataframe, on='safegraph_')
In [60]:
centroid_closed_hours.to_file("centroid_closed_hours3.geojson", driver='GeoJSON')

As we have the open and closed time, and traffic around the cafes both when they're open and closed, we can use that to find places which might be possible locations to find a new cafe location.

The data collection method is picking up traffic (represented in red) at most of the locations, even when they are shown to be closed. This suggests that in some fashion there is still foot traffic near these locations, potentially traffic that could be taken advantage of for business purposes.

A live version of an animated visualization can be seen at the below link. Detailed exploration of this tool could reveal numerous potential sites.

https://htmlpreview.github.io/?https://github.com/darshitpd/San-Francisco-Cafe-Data-Analysis/blob/master/Web-animations/san_fran_cafe_all.html

In [68]:
Image(filename='1am.png', width=800)
Out[68]:
In [69]:
Image(filename='2pm.png', width=800)
Out[69]:

Mission and 8th St Intersection

By viewing the data in this way, it is possible to identify potential locations where traffic is high, even though shops are closed. One particular location that demonstrated this idea is near the Mission and 8th St Intersection. There is a bakery in the area that shows high traffic during all of its open hours, but it closes particularly early - approximately 4pm. Even after it is closed, the location and other surrounding locations show steady rates of human traffic, suggesting the possibility that there is un-tapped potential in this area for further business development, particularly in the later afternoon and evening hours.

In [70]:
Image(filename='Copy of Project 1 Final Presentation.png', width=1000)
Out[70]:

Further analysis

20 Minutes is the most common customer dwell time for chain stores

With all median customer dwell times in chains topping out at 90 minutes. Assuming that chain stores have done extensive market research for profitability, the 20-40 minute time frame appears to be the ideal time for customer dwell.

In [71]:
Image(filename='Dwell_Chain.png', width=800)
Out[71]:
All dwell times longer than 90 minutes are found at non-chain stores

Non-chain stores have clustered dwells at 20 and 40, but also have many instances of longer median dwell times. These long dwell stores also seem to have very small numbers of customers, suggesting that these shops are not thriving to the extent of the stores with faster customer turn-over.

In [77]:
Image(filename='Dwell2.png', width=800)
Out[77]:

Stategies

If a shop is committed to the idea of facilitating long customer experiences, one potential approach would be to follow the example of this Brooklyn shop which charges customers not by the item, but by the minute above 1 hour (after paying a $6 base fee.)

https://www.businessinsider.com/coffee-shop-charges-for-time-spent-2016-10

In [73]:
Image(filename='Annotation 2019-10-16 143234.png', width=1000)
Out[73]:




Data Concerns

Although the data is interesting and may provide some business insight, it is necessary to also provide some caution about its use and realiability.

In [74]:
Image(filename='Jamba.png', width=1000)
Out[74]:

The above chart shows the median customer dwell time for all Jamba Juice stores. There is a high level of consistency clustered at the 20 min mark. Oddly, there is one store which shows double the time. Looking up this store's location reveals something interesting.

In [75]:
Image(filename='Annotation 2019-10-15 170550.png', width=1000)
Out[75]:

The store location is directly next to an escape room location that advertises 60min customer experiences. If indeed this store is actually giving 20min turn-around for customers, it seems likely that the 40 min times are being corrupted by data from the store next door, mixing with the Jamba location. Perhaps some of the customers shown by GPS to be at the Jamba location are actually at the adjacent business and staying longer. This is just one particularly clear example of many such potential issues in the data.

A SafeGraph employee admitted that there are flaws and reliabiltiy issues with the data

“There is always inherent noise in our visit attribution algorithm, due to jumpiness/errors in GPS data (such as GPS drift, GPS ping scattering in city environments)”

“Our visit data is collected from a panel of mobile GPS devices data. The likely reasons you would see visits outside of business hours are (1) we are picking up visits in the building which are not to the actual POI (for instance a condo over a Starbucks) or (2) we are picking up night workers at a location.”

This leads to questions about all of the data. When the data shows that there are many individuals at a location after regular business hours, are these actually potential customers, or is the collection method merely picking of the presence of people in a nearby appartment building, bus stop, or theater?

  1. Use current data to identify positive potential store locations/times in high-traffic, low-coverage areas
  1. Acquire SafeGraph data for other (non cafe) businesses in identified target areas to better model consumer flow through these neighborhoods
  1. Observe best identified sites, on-the-ground, to confirm accuracy/inaccuracy of patterns shown in the data and rule out false positives

Additionally

-Use the data to fine-tune the hours and operations of current businesses to follow best practices of competitors or take advantage of missed opportunities

-Always double-check before making a financial decision based on the data due to the bad signal to noise ratio in SafeGraph’s collection methods

End of Current Analysis

Youtube Link:

https://www.youtube.com/watch?v=hN3JWIhM7eQ&feature=youtu.be

Follow this link for a narrated, visual review of this analysis.